Abstract

The recent work of [4, 11] rigorously proved (in a large dimensional and statistical context) that if the number of equations (measurements in the compressed sensing terminology) in the system is proportional to the length of the unknown vector then there is a sparsity (number of non-zero elements of the unknown vector) also proportional to the length of the unknown vector such that an $\ell_1$-optimization algorithm succeeds in solving the system. In more recent papers [39, 41, 43] we considered under-determined systems with the so-called block-sparse solutions. In a large dimensional and statistical context, in [39] we determined lower bounds on the values of allowable sparsity for any given number (proportional to the length of the unknown vector) of equations such that an $\ell_2/\ell_1$-optimization algorithm succeeds in solving the system. These lower bounds happened to be in solid numerical agreement with what one can observe through numerical experiments. Here we derive the corresponding upper bounds. Moreover, the upper bounds that we obtain in this paper match the lower bounds from [39] and ultimately make them optimal.

1 Introduction

In the last several years the area of compressed sensing has been the subject of extensive research. Finding the sparsest solution of an under-determined system of linear equations is one of the focal points of the entire area. Phenomenal results of [4, 11] rigorously proved for the first time that in certain scenarios one can solve an under-determined system of linear equations by solving a linear program in polynomial time. These breakthrough results then, as expected, generated an enormous amount of research with possible applications ranging from high-dimensional geometry, image reconstruction, single-pixel camera design, decoding of linear codes, channel estimation in wireless communications, to machine learning, data-streaming algorithms, DNA micro-arrays, magneto-encephalography etc. (more on the compressed sensing problems, their importance, and wide spectrum of different applications can be found in excellent references |[Il|9l[I2l29llil^i8.iQj).

In this paper we will be interested in solving the following under-determined system of linear equations

$$A\mathbf{x} = \mathbf{y} \tag{1}$$

where $A$ is an $M \times N$ ($M < N$) measurement matrix and $\mathbf{y}$ is an $M \times 1$ measurement vector. Moreover, we will be interested in finding the $K$-sparse solution $\mathbf{x}$. By $K$-sparse we will, in the rest of the paper, mean vectors that have at most $K$ nonzero components. Also, throughout the rest of the paper we will often refer to the $K$-sparse solution of (1) simply as the solution of (1). Further, we will consider ideally sparse signals; more on the so-called approximately sparse signals can be found in e.g. QISTl. We will also assume the so-called linear regime, i.e. we will assume that $K = \beta N$ and that the number of measurements is $M = \alpha N$ where $\alpha$ and $\beta$ are absolute constants independent of $N$.

A particularly successful approach to solving (1) assumes solving the following $\ell_1$-optimization problem

$$\min \|\mathbf{x}\|_1 \quad \text{subject to } A\mathbf{x} = \mathbf{y}. \tag{2}$$

While the literature on using (2) to solve (1) is rapidly growing, below we restrict our attention to two, in our view, most influential recent works [4, 11]. Quite remarkably, for certain statistical matrices $A$, in [4, 11] the authors were able to show that if $\alpha$ and $N$ are given then any unknown vector $\mathbf{x}$ with no more than $K = \beta N$ (where $\beta$ is an absolute constant dependent on $\alpha$ and explicitly calculated in [4, 11]) non-zero elements can be recovered by solving (2). As expected, this assumes that $\mathbf{y}$ was in fact generated by that $\mathbf{x}$ and given to us. (More on the practically very important scenario when the available measurements are noisy versions of $\mathbf{y}$ can be found in seminal works [4, 49] as well as in recent developments, e.g. [35]-[37].)

2 Block-sparse signals and $\ell_2/\ell_1$-algorithm

What we described in the previous section assumes solving an under-determined system of linear equations with a standard restriction that the solution vector is sparse. Sometimes one may however encounter applications when the unknown $\mathbf{x}$ in addition to being sparse has a certain structure as well. The so-called block-sparse vectors are such a type of vectors and will be the main subject of this paper. These vectors and their potential applications and recovery algorithms were investigated in great detail in a series of recent references (see e.g. lITI 151 [T5T4T71I20] 1321141114311441). A related problem of recovering jointly sparse vectors and its applications were also considered in great detail in e.g. [2, 3, 6, 8, 18, 30, 31, 45, 48, 53, 54] and many references therein. While various other structures as well as their applications gained significant interest over the last few years, we here refrain from describing them in fine detail and instead refer to the nice work of e.g. [26, 27, 33, 52]. Since we will be interested in characterizing mathematical properties of solving linear systems that are similar to many of those mentioned above, we just state here in brief that from a mathematical point of view in all these cases one attempts to improve the recoverability potential of the standard algorithms (which are typically similar to the one described in the previous section) by incorporating the knowledge of the unknown vector structure.

To get things started we first introduce the block-sparse vectors. The subsequent exposition will also be somewhat less cumbersome if we assume that integers $N$ and $d$ are chosen such that $n = \frac{N}{d}$ is an integer and it represents the total number of blocks that $\mathbf{x}$ consists of. Clearly $d$ is the length of each block. Furthermore, we will assume that $m = \frac{M}{d}$ is an integer as well and that $\mathbf{X}_i = \mathbf{x}_{(i-1)d+1:id}$, $1 \le i \le n$, are the $n$ blocks of $\mathbf{x}$ (see Figure 1). Then we will call any signal $\mathbf{x}$ $k$-block-sparse if at most $k = \frac{K}{d}$ of its blocks $\mathbf{X}_i$ are non-zero (a non-zero block is a block that is not a zero block; a zero block is a block that has all elements equal to zero). Since $k$-block-sparse signals are $K$-sparse one could then use (2) to recover the solution of (1). While this is possible, it clearly uses the block structure of $\mathbf{x}$ in no way. To exploit the block structure of $\mathbf{x}$, in [44] the following polynomial-time algorithm (essentially a combination of $\ell_2$ and $\ell_1$ optimizations) was considered (see also e.g. ffl[l6l|47l|53|5l)

$$\min \sum_{i=1}^{n} \|\mathbf{x}_{(i-1)d+1:id}\|_2 \quad \text{subject to } A\mathbf{x} = \mathbf{y}. \tag{3}$$
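To make the block notation concrete, the following minimal sketch (an illustration only, not taken from the cited works) splits a vector into blocks of length $d$, counts its non-zero blocks, and evaluates the objective of (3). The helper names `split_blocks`, `block_sparsity` and `l2_l1_norm` are hypothetical.

```python
import math

def split_blocks(x, d):
    """Split x into consecutive blocks of length d (d must divide len(x))."""
    if len(x) % d != 0:
        raise ValueError("block length d must divide len(x)")
    return [x[i:i + d] for i in range(0, len(x), d)]

def block_sparsity(x, d):
    """Number of blocks that contain at least one non-zero entry."""
    return sum(1 for b in split_blocks(x, d) if any(v != 0 for v in b))

def l2_l1_norm(x, d):
    """Objective of (3): sum of the Euclidean norms of the blocks of x."""
    return sum(math.sqrt(sum(v * v for v in b)) for b in split_blocks(x, d))

# Hypothetical example: n = 3 blocks of length d = 2, one non-zero block.
x = [0.0, 0.0, 3.0, 4.0, 0.0, 0.0]
k = block_sparsity(x, 2)        # number of non-zero blocks
obj = l2_l1_norm(x, 2)          # l2/l1 objective of (3)
l1 = sum(abs(v) for v in x)     # plain l1 norm, for comparison with (2)
```

For $d = 1$ the two objectives coincide, which is one way to see (3) as a block generalization of (2).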

Figure 1: Block-sparse model

Extensive simulations in [44] demonstrated that as $d$ grows the algorithm in (3) significantly outperforms the standard $\ell_1$. The following was shown in [44] as well: let $A$ be an $M \times N$ matrix with a basis of null-space comprised of i.i.d. Gaussian elements; if $\alpha = \frac{M}{N} \rightarrow 1$ then there is a constant $d$ such that all $k$-block-sparse signals $\mathbf{x}$ with sparsity $K \le \beta N$, $\beta \rightarrow \frac{1}{2}$, can be recovered with overwhelming probability by solving (3). The precise relation between $d$ and how fast $\alpha \rightarrow 1$ and $\beta \rightarrow \frac{1}{2}$ was quantified in [44] as well. In [41, 43] we extended the results from [44] and obtained the values of the recoverable block-sparsity for any $\alpha$, i.e. for $0 < \alpha < 1$. More precisely, for any given constant $0 < \alpha < 1$ we in [41, 43] determined a constant $\beta = \frac{K}{N}$ such that for a sufficiently large $d$ (3) with overwhelming probability recovers any $k$-block-sparse signal with sparsity less than $K$. (Under overwhelming probability we in this paper assume a probability that is no more than a number exponentially decaying in $N$ away from 1.)

Clearly, for any given constant $\alpha \le 1$ there is a maximum allowable value of $\beta$ such that for any given $k$-block-sparse $\mathbf{x}$ in (1) the solution of (3) is with overwhelming probability exactly that given $k$-block-sparse $\mathbf{x}$. We will refer to this maximum allowable value of $\beta$ as the strong threshold (see [11, 39]). Similarly, for any given constant $\alpha \le 1$ and any given $\mathbf{x}$ with a given fixed location and given fixed directions of non-zero blocks there will be a maximum allowable value of $\beta$ such that (3) finds that given $\mathbf{x}$ in (1) with overwhelming probability. We will refer to this maximum allowable value of $\beta$ as the weak threshold and will denote it by $\beta_w$ (see, e.g. [40, 42]).

While [41, 43] provided fairly sharp strong threshold values they had done so in a somewhat asymptotic sense. Namely, the analysis presented in [41, 43] assumed fairly large values of block-length $d$. As such, the analysis in [41, 43] provided an ultimate performance limit of $\ell_2/\ell_1$-optimization rather than its performance characterization as a function of a particular fixed block-length.

In our own work [39] we extended the results of [41, 43] and provided a novel probabilistic framework for performance characterization of (3) through which we were finally able to view block-length as a parameter of the system (the heart of the framework was actually introduced in [42] and it seemed rather powerful; in fact, we afterwards found hardly any sparse type of problem that the framework was not able to handle with almost impeccable precision). Using the framework we obtained lower bounds on $\beta_w$. These lower bounds were in excellent numerical agreement with the values obtained for $\beta_w$ through numerical simulations. One would therefore be tempted to believe that our lower bounds from [39] are tight. In this paper we design a mechanism that can be used to compute the upper bounds on $\beta_w$ (as was the case with the framework of [39], the new framework does not seem to be restricted in any way to the $\ell_2/\ell_1$ type of sparsity). The obtained upper bounds will match the lower bounds computed in [39] and essentially make them optimal. We should also point out that in our recent work [38] we created results similar in flavor to those that we will present here but valid for general under-determined systems with sparse solutions (i.e. not necessarily those with block-sparse solutions). When viewed in that context the results presented here are a block analogue to those presented in [38].

Before going through the details of our own approach we briefly take a look back and mention a few other known approaches from the vast literature cited above that have recently attracted a significant amount of attention. The first thing one can think of when facing block-structured unknown vectors is how to extend results known in the non-block (i.e. standard) case. In [46] the standard OMP (orthogonal matching pursuit) was generalized so that it can handle jointly-sparse vectors more efficiently, and improvements over the standard OMP were demonstrated. In [1, 17] algorithms similar to the one from this paper were considered. It was explicitly shown through the block-RIP (block-restricted isometry property) type of analysis (which essentially extends to the block case the concepts introduced in [4] for the non-block scenario) that one can achieve improvements in recoverable thresholds compared to the non-block case. Also, important results were obtained in [18] where it was shown (also through the block-RIP type of analysis) that if one considers average case recovery of jointly-sparse signals, improvements in recoverable thresholds over the standard non-block signals are possible (of course, trivially, jointly-sparse recovery offers no improvement over the standard non-block scenario in the worst case). All these results provided a rather substantial basis for belief that the block-sparse recovery can provably be significantly more successful than the standard non-block one (as mentioned above, extensive simulations in [44] confirmed such expectations). In [39] we then provided further results in this direction and here we establish their optimality.

We organize the rest of the paper in the following way. In Section 3 we introduce two key theorems that will be the heart of our subsequent analysis. In Section 4 we create the mechanism for computing the upper bounds on $\beta_w$. Finally, in Section 5 we discuss the obtained results.

3 Key theorems

In this section we introduce two useful theorems that will be of key importance in our subsequent analysis. First we recall a null-space characterization of $A$ that guarantees that the solution of (3) is the $k$-block-sparse solution of (1). Moreover, the characterization will establish this for any $\beta n$-block-sparse $\mathbf{x}$ with a fixed location and a fixed combination of directions of nonzero blocks. Since the analysis will clearly be irrelevant with respect to what particular location and what particular combination of directions of nonzero blocks are chosen, we can for the simplicity of the exposition and without loss of generality assume that the blocks $\mathbf{X}_1, \mathbf{X}_2, \dots, \mathbf{X}_{n-k}$ of $\mathbf{x}$ are equal to zero and the blocks $\mathbf{X}_{n-k+1}, \mathbf{X}_{n-k+2}, \dots, \mathbf{X}_n$ of $\mathbf{x}$ have fixed directions. Moreover, (as mentioned earlier) throughout the paper we will call such an $\mathbf{x}$ $k$-block-sparse. Under this assumption we have the following theorem from [41] that provides such a characterization (similar characterizations that relate to the non-block case can be found in [12, 13, 19, 28, 44, 51, 55]; furthermore, if instead of $\ell_1$ one, for example, uses an $\ell_q$-optimization ($0 < q < 1$) in (2) then characterizations similar to the ones from ||I2[l3][l9l|22ffi4ll2H]|M]inilS3 can be derived as well).



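Theorem 1. (Nonzero part of $\mathbf{x}$ has fixed directions and location) Assume that a $dm \times dn$ matrix $A$ is given. Let $\mathbf{x}$ be $k$-block-sparse. Also let $\mathbf{X}_1 = \mathbf{X}_2 = \dots = \mathbf{X}_{n-k} = 0$ and let the directions of vectors $\mathbf{X}_{n-k+1}, \mathbf{X}_{n-k+2}, \dots, \mathbf{X}_n$ be fixed. Further, assume that $\mathbf{y} = A\mathbf{x}$ and that $\mathbf{w}$ is a $dn \times 1$ vector with blocks $\mathbf{W}_i$, $i = 1, \dots, n$, defined in a way analogous to the definition of blocks $\mathbf{X}_i$. If

$$(\forall \mathbf{w} \in \mathbf{R}^{dn} \,|\, A\mathbf{w} = 0) \quad -\sum_{i=n-k+1}^{n} \frac{\mathbf{X}_i^T \mathbf{W}_i}{\|\mathbf{X}_i\|_2} < \sum_{i=1}^{n-k} \|\mathbf{W}_i\|_2, \tag{4}$$

then the solution of (3) is $\mathbf{x}$. Moreover, if

$$(\exists \mathbf{w} \in \mathbf{R}^{dn} \,|\, A\mathbf{w} = 0) \quad -\sum_{i=n-k+1}^{n} \frac{\mathbf{X}_i^T \mathbf{W}_i}{\|\mathbf{X}_i\|_2} > \sum_{i=1}^{n-k} \|\mathbf{W}_i\|_2, \tag{5}$$

then there will be a $k$-block-sparse $\mathbf{x}$ from the above defined set that satisfies (1) and is not the solution of (3).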

Proof. The first part follows directly from Corollary 2 in [50]. The second part follows by combining (adjusting to the block case) the first part and the ideas of the second part of Theorem 1 in [38]. □
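As a small numerical illustration of the two sides of the null-space conditions in Theorem 1, the sketch below evaluates them for one given pair $(\mathbf{x}, \mathbf{w})$. It is only a sanity-check helper under stated assumptions: the function name `condition_sides` is hypothetical, the last $k$ blocks of $\mathbf{x}$ are assumed to be the non-zero ones, and the caller is assumed to supply a $\mathbf{w}$ from the null-space of $A$ (the code does not verify $A\mathbf{w} = 0$).

```python
import math

def norm2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))

def condition_sides(x, w, d, k):
    """Return (lhs, rhs) of the inequality in (4)/(5) for a given w.

    lhs = -sum over the k non-zero blocks of X_i^T W_i / ||X_i||_2
    rhs =  sum over the n-k zero blocks of ||W_i||_2
    Assumes the last k blocks of x are the non-zero ones.
    """
    n = len(x) // d
    X = [x[i * d:(i + 1) * d] for i in range(n)]
    W = [w[i * d:(i + 1) * d] for i in range(n)]
    lhs = -sum(
        sum(a * b for a, b in zip(X[i], W[i])) / norm2(X[i])
        for i in range(n - k, n)
    )
    rhs = sum(norm2(W[i]) for i in range(n - k))
    return lhs, rhs

# Hypothetical toy data: d = 2, n = 2, k = 1.
x = [0.0, 0.0, 3.0, 4.0]
lhs_a, rhs_a = condition_sides(x, [2.0, 0.0, -0.6, -0.8], d=2, k=1)  # lhs < rhs
lhs_b, rhs_b = condition_sides(x, [0.5, 0.0, -0.6, -0.8], d=2, k=1)  # lhs > rhs
```

The first $\mathbf{w}$ is of the kind allowed by (4); the second, if it belonged to the null-space of $A$, would be a witness for (5).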

Before proceeding further we would like to make a point similar to the one we have made in [38]. In our opinion the first part of the theorem that was put forth in [39] (and in essence in [40]) is the unsung hero of all the success achieved in the thresholds analysis through various frameworks that we eventually designed. As mentioned in [38], it was first recognized in [40] that characterizations of the type given in the first part of the above theorem could lead to the optimal threshold performance. As it later became clear, the analysis in [40] stopped somewhat short of the ultimate goal and it achieved only a moderate success in performance characterization of $\ell_1$-optimization. While the analysis of [42] formally completed the task of evaluating fairly precisely the achievable thresholds, it is the first part of the above theorem (or rather its non-block equivalent from [40]) that made everything possible. Along the same lines, while the framework created in [42] was good enough to fairly precisely evaluate the achievable thresholds, it is the first part of the above theorem that made the block generalization of results from [42] possible in [39].

Now, with regard to the second part of the above theorem, the story is of course similar. Its proof is rather simple and in fact almost completely follows the ideas of the non-block case (see, e.g. [13, 22, 38]). It is just that we never presented it before. Basically, we did not find the second part of the theorem to be of any (let alone much) use if one were to create the lower bounds on the thresholds. However, as the reader might guess, if one is concerned with proving the upper bounds the second part of the above theorem becomes the same type of unsung hero that the first one was for the success of the framework of [39]. Below we use it to create a machinery as powerful as the one from [39] that provides the corresponding framework for upper-bounding the thresholds.

Before moving to the design of the framework, we would also like to say a few words about a possible design of the matrix $A$ that would satisfy the conditions of Theorem 1. Designing matrix $A$ such that (4) holds would not be that hard. The problem is that one does not know a priori which $k$ blocks of $\mathbf{x}$ will be nonzero and which directions they will have. That would essentially force one to design $A$ such that (4) holds for any subset of $\{1, 2, \dots, n\}$ of cardinality $k$ and any combination of directions on that subset. If one assumes that $m$ and $k$ are proportional to $n$ (the case of our interest in this paper) this is an enormous combinatorial task and the construction of such a deterministic matrix $A$ is clearly not easy (in fact, as observed in e.g. [39, 42], one may say that its non-block counterpart is one of the most fundamental open problems in the area of theoretical compressed sensing; more on an equally important inverse problem of checking if a given matrix satisfies the condition of Theorem 1 for any subset of $\{1, 2, \dots, n\}$ of cardinality $k$ and any combination of block directions, the interested reader can find in [10, 25]). On the other hand, turning to random matrices significantly simplifies things. As we will see later in the paper, Gaussian random matrices $A$ will turn out to be a very convenient choice. The following phenomenal result from [21] that relates to such matrices will be the key ingredient in the analysis that will follow.

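Theorem 2. ([21]) Let $X_{ij}$ and $Y_{ij}$, $1 \le i \le n$, $1 \le j \le m$, be two centered Gaussian processes which satisfy the following inequalities for all choices of indices

1. $E(X_{ij}^2) = E(Y_{ij}^2)$

2. $E(X_{ij}X_{ik}) \ge E(Y_{ij}Y_{ik})$

3. $E(X_{ij}X_{lk}) \le E(Y_{ij}Y_{lk})$, $i \ne l$.

Then

$$P\left(\bigcap_{i}\bigcup_{j}(X_{ij} \ge \lambda_{ij})\right) \le P\left(\bigcap_{i}\bigcup_{j}(Y_{ij} \ge \lambda_{ij})\right).$$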



4 Upper-bounding $\beta_w$ - general $\mathbf{x}$

In this section we probabilistically analyze the validity of the null-space characterization given in the second part of Theorem 1. Essentially, we will design a mechanism for computing upper bounds on $\beta_w$ (in fact, since it will be slightly more convenient, we will actually determine lower bounds on $\alpha$; that is of course conceptually the same as finding the upper bounds on $\beta$).

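We start by defining a quantity $\tau$ that will play one of the key roles below

$$\begin{aligned} \tau(A) = \min_{\mathbf{w}} \quad & \sum_{i=1}^{n-k} \|\mathbf{W}_i\|_2 + \sum_{i=n-k+1}^{n} \frac{\mathbf{X}_i^T \mathbf{W}_i}{\|\mathbf{X}_i\|_2} \\ \text{subject to} \quad & A\mathbf{w} = 0 \\ & \|\mathbf{w}\|_2 \le 1. \end{aligned} \tag{6}$$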

Now, we will in the rest of the paper assume that the entries of $A$ are i.i.d. standard normal random variables. Then one can say that for any $\alpha$ and $\beta$ for which

$$\lim_{n \rightarrow \infty} P(\tau(A) < 0) = 1, \tag{7}$$

there is a $k$-block-sparse $\mathbf{x}$ (from a set of $\mathbf{x}$'s with a given fixed location of nonzero blocks and a given fixed combination of their directions) which (3) with probability 1 fails to find. For a fixed $\beta$ our goal will be to find the largest possible $\alpha$ for which (7) holds, i.e. for which (3) fails with probability 1.

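Before going through the randomness of the problem and evaluation of $P(\tau(A) < 0)$ we will try to provide a more explicit expression for $\tau$ than the one given by the optimization problem in (6). As a first step we write the Lagrange dual of (6) over $\mathbf{w}$

$$\begin{aligned} \tau(A) = \max_{\nu,\gamma} \min_{\mathbf{w}} \quad & \left( \sum_{i=1}^{n-k} \|\mathbf{W}_i\|_2 + \sum_{i=n-k+1}^{n} \frac{\mathbf{X}_i^T \mathbf{W}_i}{\|\mathbf{X}_i\|_2} + \nu^T A \mathbf{w} + \gamma \sum_{i=1}^{n} \|\mathbf{W}_i\|_2^2 - \gamma \right) \\ \text{subject to} \quad & \gamma \ge 0. \end{aligned} \tag{8}$$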

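To simplify the exposition we set

$$\begin{aligned} \psi^{(i)} &= \nu^T A_i, \quad 1 \le i \le n-k \\ \psi^{(i)} &= \frac{\mathbf{X}_i^T}{\|\mathbf{X}_i\|_2} + \nu^T A_i, \quad n-k+1 \le i \le n, \end{aligned} \tag{9}$$

(here $A_i = A_{:,(i-1)d+1:id}$ denotes the $d$ columns of $A$ that multiply $\mathbf{W}_i$) and assume that $\psi_j^{(i)}$, $1 \le j \le d$, is the $j$-th component of $\psi^{(i)}$. Then one can rewrite (8) in the following way

$$\begin{aligned} \tau(A) = \max_{\nu,\gamma} \min_{\mathbf{w}} \quad & \left( \sum_{i=1}^{n-k} \|\mathbf{W}_i\|_2 + \sum_{i=1}^{n} \psi^{(i)} \mathbf{W}_i + \gamma \sum_{i=1}^{n} \|\mathbf{W}_i\|_2^2 - \gamma \right) \\ \text{subject to} \quad & \gamma \ge 0. \end{aligned} \tag{10}$$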



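Let

$$f_1(\nu, \gamma, \mathbf{w}) = \sum_{i=1}^{n-k} \|\mathbf{W}_i\|_2 + \sum_{i=1}^{n} \psi^{(i)} \mathbf{W}_i + \gamma \sum_{i=1}^{n} \|\mathbf{W}_i\|_2^2 - \gamma. \tag{11}$$

We then proceed by solving the inner minimization in (10). Since $f_1(\cdot)$ is convex in $\mathbf{w}$ we simply find the optimal $\mathbf{w}$ by equating the derivative of $f_1(\cdot)$ with respect to $\mathbf{w}$ to zero. We then have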

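$$\begin{aligned} \frac{d f_1(\nu, \gamma, \mathbf{w})}{d \mathbf{W}_i} &= \frac{\mathbf{W}_i^T}{\|\mathbf{W}_i\|_2} + \psi^{(i)} + 2\gamma \mathbf{W}_i^T = \mathbf{0}, \quad 1 \le i \le n-k \\ \frac{d f_1(\nu, \gamma, \mathbf{w})}{d \mathbf{W}_i} &= \psi^{(i)} + 2\gamma \mathbf{W}_i^T = \mathbf{0}, \quad n-k+1 \le i \le n, \end{aligned} \tag{12}$$

where $\mathbf{0}$'s are obviously $d$-dimensional row vectors of all zeros. At this point we will make an assumption that the above system can be solved. If it indeed can be solved the solution must satisfy

$$\mathbf{W}_i^T \left( 2\gamma + \frac{1}{\|\mathbf{W}_i\|_2} \right) = -\psi^{(i)}, \quad 1 \le i \le n-k, \tag{13}$$

and one would have

$$2\|\mathbf{W}_i\|_2 \gamma + 1 = \|\psi^{(i)}\|_2, \quad 1 \le i \le n-k. \tag{14}$$

After plugging the value of $\|\mathbf{W}_i\|_2$, $1 \le i \le n-k$, back in (13) we have

$$\mathbf{W}_i^T = -\psi^{(i)} \frac{\|\psi^{(i)}\|_2 - 1}{2\gamma \|\psi^{(i)}\|_2}, \quad 1 \le i \le n-k. \tag{15}$$

We should now note that for any $i \in \{1, \dots, n-k\}$, $\mathbf{W}_i$ from (15) is indeed the solution of (12) if $\|\psi^{(i)}\|_2 \ge 1$. Otherwise one has $\mathbf{W}_i = \mathbf{0}$ (here $\mathbf{0}$ obviously stands for a column vector of $d$ zeros). On the other hand, from the second set of equations in (12) one easily has

$$\mathbf{W}_i^T = -\frac{\psi^{(i)}}{2\gamma}, \quad n-k+1 \le i \le n. \tag{16}$$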
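The closed-form minimizer for a block with $1 \le i \le n-k$ can be checked numerically. The sketch below is only a sanity check under stated assumptions (hypothetical names `h` and `closed_form`, arbitrary test values for $\psi^{(i)}$ and $\gamma$): it evaluates the per-block term $\|\mathbf{W}_i\|_2 + \psi^{(i)}\mathbf{W}_i + \gamma\|\mathbf{W}_i\|_2^2$ at the candidate minimizer and compares it against random perturbations.

```python
import math
import random

def norm2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))

def h(W, psi, gamma):
    """Per-block term of f_1 for a block with index i <= n - k."""
    return norm2(W) + sum(p * w for p, w in zip(psi, W)) + gamma * norm2(W) ** 2

def closed_form(psi, gamma):
    """Candidate minimizer: -psi (||psi|| - 1) / (2 gamma ||psi||), or 0 if ||psi|| < 1."""
    p = norm2(psi)
    if p < 1.0:
        return [0.0] * len(psi)
    scale = (p - 1.0) / (2.0 * gamma * p)
    return [-scale * t for t in psi]

rng = random.Random(0)
psi, gamma = [1.5, -2.0, 0.5], 0.7          # arbitrary test values
W_star = closed_form(psi, gamma)
best = h(W_star, psi, gamma)
# Value predicted by direct substitution: -(||psi||_2 - 1)^2 / (4 gamma)
predicted = -(norm2(psi) - 1.0) ** 2 / (4.0 * gamma)
# h is convex, so no perturbation of W_star should give a smaller value.
no_better = all(
    h([w + rng.uniform(-0.5, 0.5) for w in W_star], psi, gamma) >= best - 1e-12
    for _ in range(1000)
)
```

The value $-(\|\psi^{(i)}\|_2 - 1)^2/(4\gamma)$ is what each such block contributes once the minimizer is substituted back into $f_1$.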

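Plugging the results from (15) and (16) back in (11) we obtain

$$\min_{\mathbf{w}} f_1(\nu, \gamma, \mathbf{w}) = \sum_{i=1}^{n-k} \left( \frac{\|\psi^{(i)}\|_2 - 1}{2\gamma} - \frac{\|\psi^{(i)}\|_2 (\|\psi^{(i)}\|_2 - 1)}{2\gamma} + \frac{(\|\psi^{(i)}\|_2 - 1)^2}{4\gamma} \right) \mathbf{1}_{(\|\psi^{(i)}\|_2 - 1 \ge 0)} - \sum_{i=n-k+1}^{n} \frac{\|\psi^{(i)}\|_2^2}{4\gamma} - \gamma, \tag{17}$$

where

$$\mathbf{1}_{(\|\psi^{(i)}\|_2 - 1 \ge 0)} = \begin{cases} 1 & \text{if } \|\psi^{(i)}\|_2 - 1 \ge 0 \\ 0 & \text{otherwise.} \end{cases} \tag{18}$$

Transforming (17) further we have

$$\min_{\mathbf{w}} f_1(\nu, \gamma, \mathbf{w}) = -\sum_{i=1}^{n-k} \frac{(\|\psi^{(i)}\|_2 - 1)^2 \mathbf{1}_{(\|\psi^{(i)}\|_2 - 1 \ge 0)}}{4\gamma} - \sum_{i=n-k+1}^{n} \frac{\|\psi^{(i)}\|_2^2}{4\gamma} - \gamma. \tag{19}$$

A combination of (10) and (19) gives

$$\begin{aligned} \tau(A) = \max_{\nu,\gamma} \quad & -\sum_{i=1}^{n-k} \frac{(\|\psi^{(i)}\|_2 - 1)^2 \mathbf{1}_{(\|\psi^{(i)}\|_2 - 1 \ge 0)}}{4\gamma} - \sum_{i=n-k+1}^{n} \frac{\|\psi^{(i)}\|_2^2}{4\gamma} - \gamma \\ \text{subject to} \quad & \gamma \ge 0. \end{aligned} \tag{20}$$

After solving over $\gamma$ we finally have

$$\tau(A) = -\min_{\nu} \sqrt{\sum_{i=1}^{n-k} (\|\psi^{(i)}\|_2 - 1)^2 \mathbf{1}_{(\|\psi^{(i)}\|_2 - 1 \ge 0)} + \sum_{i=n-k+1}^{n} \|\psi^{(i)}\|_2^2}. \tag{21}$$

The following rather small trick in rewriting the previous equation turns out to be useful

$$\begin{aligned} \tau(A) = -\min_{\nu, \mathbf{z}} \quad & \sqrt{\sum_{i=1}^{n-k} (\|\psi^{(i)}\|_2 - \mathbf{z}_i)^2 + \sum_{i=n-k+1}^{n} \|\psi^{(i)}\|_2^2} \\ \text{subject to} \quad & 0 \le \mathbf{z}_i \le 1, \quad 1 \le i \le n-k. \end{aligned} \tag{22}$$

Now we set

$$f_2(\nu, A, \mathbf{z}) = \sqrt{\sum_{i=1}^{n-k} (\|\psi^{(i)}\|_2 - \mathbf{z}_i)^2 + \sum_{i=n-k+1}^{n} \|\psi^{(i)}\|_2^2}. \tag{23}$$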
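Two elementary facts are behind the last steps: maximizing $-S/(4\gamma) - \gamma$ over $\gamma > 0$ gives $-\sqrt{S}$ (attained at $\gamma = \sqrt{S}/2$), and minimizing $(p - z)^2$ over $0 \le z \le 1$ for $p \ge 0$ gives $(p-1)^2$ when $p \ge 1$ and $0$ otherwise. The following sketch checks both numerically; the function names and the test values are hypothetical and only serve as an illustration.

```python
import math

def tau_from_gamma(S, gamma):
    """Objective in gamma: -S / (4 gamma) - gamma, with S >= 0 the bracketed sum."""
    return -S / (4.0 * gamma) - gamma

def clipped_sq(p):
    """(p - 1)^2 if p >= 1, else 0: the term carrying the indicator function."""
    return (p - 1.0) ** 2 if p >= 1.0 else 0.0

def min_over_z(p, grid=10001):
    """Brute-force min over 0 <= z <= 1 of (p - z)^2 on a uniform grid."""
    return min((p - j / (grid - 1)) ** 2 for j in range(grid))

S = 2.5                                        # arbitrary non-negative test value
gamma_star = math.sqrt(S) / 2.0                # stationary point of the objective
value_star = tau_from_gamma(S, gamma_star)     # equals -sqrt(S)
other_values = [tau_from_gamma(S, g) for g in (0.1, 0.5, 1.0, 2.0, 5.0)]
```

The second fact is exactly why the indicator can be traded for the auxiliary variables $\mathbf{z}_i \in [0, 1]$.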



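Let $\Theta_i$, $n-k+1 \le i \le n$, be $d \times d$ unitary matrices such that $\Theta_i \mathbf{X}_i = [1, 0, 0, 0, \dots, 0]^T$. Further let $Q^{(1)} \in \mathbf{R}^{(n-k+dk) \times (dn)}$ be a matrix that has zeros everywhere except

$$\begin{aligned} Q^{(1)}_{i,\, d(i-1)+1:di} &= \mathbf{q}_i, \quad 1 \le i \le n-k \\ Q^{(1)}_{n-k+d(i-n+k-1)+1:\,n-k+d(i-n+k),\; d(i-1)+1:di} &= \Theta_i, \quad n-k+1 \le i \le n, \end{aligned} \tag{24}$$

and let $\|\mathbf{q}_i\|_2 = 1$, $1 \le i \le n-k$. One can then write

$$f_2(\nu, A, \mathbf{z}) = \max_{\mathbf{a}} \mathbf{a}^T \left( Q^{(1)} (A^T \nu - \mathbf{x}^{(1)}) - \mathbf{z}^{(1)} \right), \tag{25}$$

where $\mathbf{x}^{(1)} \in \mathbf{R}^{dn}$, $\mathbf{z}^{(1)} \in \mathbf{R}^{n-k+dk}$, $\mathbf{a} \in \mathbf{R}^{n-k+dk}$, and

$$\begin{aligned} \|\mathbf{a}\|_2 &= 1 \\ \mathbf{x}^{(1)}_i &= 0, \quad 1 \le i \le d(n-k) \\ \mathbf{x}^{(1)}_i &= \mathbf{x}_i, \quad d(n-k)+1 \le i \le dn \\ \mathbf{z}^{(1)}_i &= \mathbf{z}_i, \quad 1 \le i \le n-k \\ \mathbf{z}^{(1)}_i &= 0, \quad n-k+1 \le i \le n-k+dk. \end{aligned} \tag{26}$$

Now let $Q \in \mathbf{R}^{(n-k+dk) \times (dn)}$ be a matrix that has zeros everywhere except

$$\begin{aligned} Q_{i,\, d(i-1)+1:di} &= \mathbf{q}_i, \quad 1 \le i \le n-k \\ Q_{n-k+1:n-k+dk,\; d(n-k)+1:dn} &= I \end{aligned} \tag{27}$$

and again as above $\|\mathbf{q}_i\|_2 = 1$, $1 \le i \le n-k$. Furthermore, since $Q^{(1)}_{n-k+1:n-k+dk,\; d(n-k)+1:dn} A^T_{d(n-k)+1:dn,\; 1:dm}$ has the same distribution as $A^T_{d(n-k)+1:dn,\; 1:dm}$, for the statistical purposes that we will consider later in the paper one can rewrite (25) in the following way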



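$$f_2(\nu, A, \mathbf{z}) = \max_{\mathbf{a}} \mathbf{a}^T \left( Q (A^T \nu - \mathbf{x}^{(2)}) - \mathbf{z}^{(1)} \right), \tag{28}$$

where $\mathbf{z}^{(1)} \in \mathbf{R}^{n-k+dk}$ and $\mathbf{a} \in \mathbf{R}^{n-k+dk}$ are as above and $\mathbf{x}^{(2)} \in \mathbf{R}^{dn}$ has zeros everywhere except

$$\mathbf{x}^{(2)}_{di+1} = 1, \quad n-k \le i \le n-1. \tag{29}$$

At this point we are almost ready to switch to the probabilistic aspect of the analysis. To that end we do the last piece of transformation. Namely, we set $\mathbf{z}^{(2)} = Q\mathbf{x}^{(2)} + \mathbf{z}^{(1)}$ and rewrite (22) as

$$\begin{aligned} \tau(A) = -\min_{\mathbf{z}^{(2)}, \nu} \max_{\|\mathbf{a}\|_2 = 1,\, \mathbf{q}_i} \quad & \left( \mathbf{a}^T Q A^T \nu - \mathbf{a}^T \mathbf{z}^{(2)} \right) \\ \text{subject to} \quad & 0 \le \mathbf{z}^{(2)}_i \le 1, \quad 1 \le i \le n-k \\ & \mathbf{z}^{(2)}_{n-k+d(i-1)+2:n-k+di} = \mathbf{0}_{1:d-1}, \quad 1 \le i \le k \\ & \mathbf{z}^{(2)}_{n-k+d(i-1)+1} = 1, \quad 1 \le i \le k \\ & \|\mathbf{q}_i\|_2 = 1, \quad 1 \le i \le n-k \end{aligned} \tag{30}$$

where $\mathbf{0}_{1:d-1}$ is a vector of $d-1$ zeros. Also, we will call $Z$ the set of all $\mathbf{z}^{(2)}$ that are feasible in (30). Now we are ready to invoke the results from Theorem 2. We do so through the following modification of the corresponding lemma from [38] which itself is a slightly modified version of Lemma 3.1 from [21] (Lemma 3.1 is of course a direct consequence of Theorem 2 and the backbone of the escape through a mesh theorem utilized in [42]).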

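Lemma 1. Let $A$ be a $dm \times dn$ matrix with i.i.d. standard normal components. Let $\mathbf{g}$ and $\mathbf{h}$ be $dn \times 1$ and $dm \times 1$ vectors, respectively, with i.i.d. standard normal components. Also, let $g$ be a standard normal random variable and let $Z$ and $Q$ be as defined above. Then

$$P\left( \min_{\mathbf{z}^{(2)} \in Z,\, \nu \in \mathbf{R}^{dm} \setminus 0} \; \max_{\|\mathbf{a}\|_2 = 1,\, Q} \left( \mathbf{a}^T Q A^T \nu + \|\nu\|_2 g - \zeta_{\mathbf{a}, \mathbf{z}^{(2)}, \nu} \right) \ge 0 \right) \ge P\left( \min_{\mathbf{z}^{(2)} \in Z,\, \nu \in \mathbf{R}^{dm} \setminus 0} \; \max_{\|\mathbf{a}\|_2 = 1,\, Q} \left( \|\nu\|_2 \mathbf{a}^T Q \mathbf{g} + \sum_{i=1}^{dm} \mathbf{h}_i \nu_i - \zeta_{\mathbf{a}, \mathbf{z}^{(2)}, \nu} \right) \ge 0 \right). \tag{31}$$

Proof. The proof is exactly the same as the one of the corresponding lemma from [38] (or for that matter as the one of Lemma 3.1 in [21]). The only difference is that in the current context $\mathbf{a}^T Q$ plays the role that $\mathbf{a}^T$ played in the corresponding lemma in [38]. □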

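Let $\zeta_{\mathbf{a}, \mathbf{z}^{(2)}, \nu} = \epsilon_5^{(g)} \sqrt{dn} \|\nu\|_2 + \mathbf{a}^T \mathbf{z}^{(2)}$ with $\epsilon_5^{(g)} > 0$ being an arbitrarily small constant independent of $n$. Then the right-hand side of the inequality in (31) is the following probability of interest

$$P\left( \min_{\mathbf{z}^{(2)} \in Z,\, \nu \in \mathbf{R}^{dm} \setminus 0} \; \max_{\|\mathbf{a}\|_2 = 1,\, Q} \left( \|\nu\|_2 \mathbf{a}^T Q \mathbf{g} + \sum_{i=1}^{dm} \mathbf{h}_i \nu_i - \epsilon_5^{(g)} \sqrt{dn} \|\nu\|_2 - \mathbf{a}^T \mathbf{z}^{(2)} \right) \ge 0 \right).$$

After solving the inner maximization over $\mathbf{a}$ and $Q$ and pulling out $\|\nu\|_2$ one has

$$P\left( \min_{\mathbf{z}^{(2)} \in Z,\, \nu \in \mathbf{R}^{dm} \setminus 0} \left( \left\| \bar{\mathbf{g}} - \frac{1}{\|\nu\|_2} \mathbf{z}^{(2)} \right\|_2 + \sum_{i=1}^{dm} \frac{\mathbf{h}_i \nu_i}{\|\nu\|_2} - \epsilon_5^{(g)} \sqrt{dn} \right) \ge 0 \right),$$

where $\bar{\mathbf{g}} = [\bar{\mathbf{g}}_{(1)}, \bar{\mathbf{g}}_{(2)}, \dots, \bar{\mathbf{g}}_{(n-k)}, \mathbf{g}_{d(n-k)+1}, \mathbf{g}_{d(n-k)+2}, \dots, \mathbf{g}_{dn}]^T$, and $[\bar{\mathbf{g}}_{(1)}, \bar{\mathbf{g}}_{(2)}, \dots, \bar{\mathbf{g}}_{(n-k)}]$ are the magnitudes of vectors $[\mathbf{g}_{1:d}, \mathbf{g}_{d+1:2d}, \dots, \mathbf{g}_{d(n-k-1)+1:d(n-k)}]$ sorted in increasing order. Minimization of the second term (over the direction of $\nu$, which gives $-\|\mathbf{h}\|_2$) then gives us

$$P\left( \min_{\mathbf{z}^{(2)} \in Z,\, \nu \in \mathbf{R}^{dm} \setminus 0} \left( \left\| \bar{\mathbf{g}} - \frac{1}{\|\nu\|_2} \mathbf{z}^{(2)} \right\|_2 \right) \ge \|\mathbf{h}\|_2 + \epsilon_5^{(g)} \sqrt{dn} \right). \tag{32}$$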
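The norm $\|\mathbf{h}\|_2$ on the right-hand side of (32) is the norm of $dm$ i.i.d. standard normal variables, so it concentrates tightly around $\sqrt{dm}$. The following Monte Carlo sketch illustrates this; the dimension, the value of the small constant and the function name `chi_norm_samples` are arbitrary choices made for the illustration and do not come from the paper.

```python
import math
import random

def chi_norm_samples(dim, trials, seed=0):
    """Draw `trials` samples of ||h||_2 for h with `dim` i.i.d. N(0,1) entries."""
    rng = random.Random(seed)
    return [
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim)))
        for _ in range(trials)
    ]

dim = 400    # plays the role of dm (assumed value)
eps = 0.15   # plays the role of a small positive constant (assumed value)
samples = chi_norm_samples(dim, 500)

# Empirical frequency of the event ||h||_2 < (1 + eps) sqrt(dm).
frac_below = sum(s < (1.0 + eps) * math.sqrt(dim) for s in samples) / len(samples)
# Ratio of the empirical mean of ||h||_2 to sqrt(dm); close to 1 for large dm.
mean_ratio = sum(samples) / len(samples) / math.sqrt(dim)
```

Increasing `dim` makes the event fail even more rarely, in line with an exponentially small failure probability.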



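Since $\mathbf{h}$ is a vector of $dm$ i.i.d. standard normal variables it is rather trivial that $P(\|\mathbf{h}\|_2 < (1 + \epsilon_1^{(m)}) \sqrt{dm}) \ge 1 - e^{-\epsilon_2^{(m)} dm}$ where $\epsilon_1^{(m)} > 0$ is an arbitrarily small constant and $\epsilon_2^{(m)}$ is a constant dependent on $\epsilon_1^{(m)}$ but independent of $n$. Then from (32) one obtains

$$P\left( \min_{\mathbf{z}^{(2)} \in Z,\, \nu \in \mathbf{R}^{dm} \setminus 0} \left( \left\| \bar{\mathbf{g}} - \frac{1}{\|\nu\|_2} \mathbf{z}^{(2)} \right\|_2 \right) \ge \|\mathbf{h}\|_2 + \epsilon_5^{(g)} \sqrt{dn} \right) \ge (1 - e^{-\epsilon_2^{(m)} dm}) P\left( \min_{\mathbf{z}^{(2)} \in Z,\, \nu \in \mathbf{R}^{dm} \setminus 0} \left( \left\| \bar{\mathbf{g}} - \frac{1}{\|\nu\|_2} \mathbf{z}^{(2)} \right\|_2 \right) \ge (1 + \epsilon_1^{(m)}) \sqrt{dm} + \epsilon_5^{(g)} \sqrt{dn} \right). \tag{33}$$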



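To make results as parallel as possible to the ones created in [39] we will now set $\mathbf{G}^*_i = \mathbf{g}_{d(n-k+i-1)+2:d(n-k+i)}$, $1 \le i \le k$, and

$$\mathbf{G} = [\bar{\mathbf{g}}_{(1)}, \bar{\mathbf{g}}_{(2)}, \dots, \bar{\mathbf{g}}_{(n-k)}, \mathbf{g}_{d(n-k)+1}, \mathbf{g}_{d(n-k+1)+1}, \dots, \mathbf{g}_{d(n-1)+1}, \|\mathbf{G}^*_1\|_2, \|\mathbf{G}^*_2\|_2, \dots, \|\mathbf{G}^*_k\|_2]^T. \tag{34}$$

Moreover, we will set

$$Z_G = \{\mathbf{z}^{(2)} \,|\, 0 \le \mathbf{z}^{(2)}_i \le 1,\ 1 \le i \le n-k;\ \ \mathbf{z}^{(2)}_i = 1,\ n-k+1 \le i \le n;\ \ \mathbf{z}^{(2)}_i = 0,\ n+1 \le i \le n+k\}. \tag{35}$$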
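The construction of the $(n+k)$-dimensional vector $\mathbf{G}$ from the $dn$-dimensional vector $\mathbf{g}$ may be easier to follow in code. The sketch below is a direct reading of the definition, with a hypothetical function name `build_G` and toy input data: sorted magnitudes of the first $n-k$ blocks, then the first components of the last $k$ blocks, then the norms of the remaining $d-1$ components of each of those blocks.

```python
import math

def norm2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))

def build_G(g, d, n, k):
    """Assemble the (n + k)-vector G from a dn-vector g."""
    if len(g) != d * n:
        raise ValueError("g must have length d * n")
    blocks = [g[i * d:(i + 1) * d] for i in range(n)]
    sorted_mags = sorted(norm2(b) for b in blocks[:n - k])   # increasing order
    firsts = [b[0] for b in blocks[n - k:]]                   # first component of each of the last k blocks
    rest_norms = [norm2(b[1:]) for b in blocks[n - k:]]       # norm of the other d - 1 components
    return sorted_mags + firsts + rest_norms

# Toy example with d = 2, n = 3, k = 1.
g = [3.0, 4.0, 0.0, 1.0, 2.0, -2.0]
G = build_G(g, d=2, n=3, k=1)
```

With this grouping, the feasible set for $\mathbf{z}^{(2)}$ pairs entries in $[0,1]$ with the sorted magnitudes, ones with the first components, and zeros with the residual norms.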

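One can then rewrite (33) in the following way

$$P\left( \min_{\mathbf{z}^{(2)} \in Z_G,\, \nu \in \mathbf{R}^{dm} \setminus 0} \left( \left\| \mathbf{G} - \frac{1}{\|\nu\|_2} \mathbf{z}^{(2)} \right\|_2 \right) \ge \|\mathbf{h}\|_2 + \epsilon_5^{(g)} \sqrt{dn} \right) \ge (1 - e^{-\epsilon_2^{(m)} dm}) P\left( \min_{\mathbf{z}^{(2)} \in Z_G,\, \nu \in \mathbf{R}^{dm} \setminus 0} \left( \left\| \mathbf{G} - \frac{1}{\|\nu\|_2} \mathbf{z}^{(2)} \right\|_2 \right) \ge (1 + \epsilon_1^{(m)}) \sqrt{dm} + \epsilon_5^{(g)} \sqrt{dn} \right). \tag{36}$$

The optimization on the right-hand side of (33) is structurally the same as the one in equation (16) in [39] (actually, to be more precise, it is the same as the weak threshold equivalent to (16)). Essentially, the exact equivalence between these optimizations is achieved after in (16) from [39] $\mathbf{H}$ is replaced by $\mathbf{G}$, $v$ is replaced by $\frac{1}{\|\nu\|_2}$, $\lambda$ is restricted to the lower $(n-k)$ components, and after one additionally notes that in (16) from [39] $0 \le \lambda \le v$, which corresponds to $0 \le \mathbf{z}_i \le 1$, $1 \le i \le n-k$, introduced above (that way one would in essence obtain the weak threshold equivalent to (16); this was not explicitly written anywhere in [39] but is rather obvious; in [39] we, instead, made a "weak" equivalence to its (30)). With these replacements one can then use the machinery of [39] to establish

$$\min_{\mathbf{z}^{(2)} \in Z_G,\, \nu \in \mathbf{R}^{dm} \setminus 0} \left( \left\| \mathbf{G} - \frac{1}{\|\nu\|_2} \mathbf{z}^{(2)} \right\|_2 \right) = \sqrt{\sum_{i=c_w+1}^{n+k} \mathbf{G}_i^2 - \frac{\left( (\mathbf{G}^T \mathbf{z}^{(2)}) - \sum_{i=1}^{c_w} \mathbf{G}_i \right)^2}{n - c_w}}, \tag{37}$$

where $c_w$ is the solution of

$$\frac{(\mathbf{G}^T \mathbf{z}^{(2)}) - \sum_{i=1}^{c_w} \mathbf{G}_i}{n - c_w} = \mathbf{G}_{c_w}. \tag{38}$$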



As a side remark, we should point out that the key point to the success of our method is that the derivation of [39] establishes the equality in (37). It is just that in [39] only the "smaller than" inequality part of this equality was utilized. At this point we have established the core of our upper-bounding arguments. The rest is just a slightly modified repetition of the derivations from [39] (or one may think of them as a block parallelization of the derivations presented in [38]) so that we can make everything precise.

First we will define two quantities ciJ and Cw as the solutions of the following two equations: 



(0 



l-6r^)i?((G^z(2))-^^»^G,) 



n 



c^'^ 

^w 



:i+6(^))^((gV2))-e£g. 



n 







1 + e^^ ' 



0. 



(39) 



where $F^{-1}_{\chi_d}(\cdot)$ is the inverse cdf of the chi random variable with $d$ degrees of freedom ($\chi_d$), and $\epsilon_i^{(c)} > 0$, $1 \le i \le 2$, are arbitrarily small constants independent of $n$. It follows then directly from the derivation (33) - (44) in [39] that



.W 



.(^) 



(0 A^) 



})>l 



(c) 



(0 Sn) 



(40) 



where eg is a constant dependent on e^ > 0, 1 < i < 2, Cw , Cw but independent of n. We now set 
Cw = c"w and focus on (|37] ). Concentration analysis machinery of 109^1 will help us establish a "high 
probability" lower bound on $f_G(c_w)$ (this will amount to nothing but reversing the concentration arguments that we have established in [39]; concentration arguments are of course easy to reverse; what was harder to reverse was the part before (37)). We now split $f_G(c_w)$ into two parts, i.e.



$$f_G(c_w) = f_G^{(1)}(c_w) - f_G^{(2)}(c_w), \tag{41}$$

where $f_G^{(1)}(c_w) = \sum_{i=c_w+1}^{n+k} \mathbf{G}_i^2$ and $f_G^{(2)}(c_w) = \left( (\mathbf{G}^T \mathbf{z}^{(2)}) - \sum_{i=1}^{c_w} \mathbf{G}_i \right)$. Now, $f_G^{(1)}(c_w)$ concentrates trivially; the argument is the same as the one that can be established when $c_w = 0$ (alternatively one can repeat derivation (42) from [42] to obtain the Lipschitz constant and combine it with Lipschitz concentration formula (36) also in [39]). So we have



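$$P\left( f_G^{(1)}(c_w) \ge (1 - \epsilon_1^{(g)}) E f_G^{(1)}(c_w) \right) \ge 1 - e^{-\epsilon_2^{(g)} n}, \tag{42}$$

where again as usual $\epsilon_1^{(g)} > 0$ is an arbitrarily small constant and $\epsilon_2^{(g)}$ is a constant dependent on $\epsilon_1^{(g)}$ and $c_w$, but independent of $n$. On the other hand, concentration of $f_G^{(2)}(c_w)$ follows by reversing (43) from [39], i.e.

$$P\left( f_G^{(2)}(c_w) \le (1 + \epsilon_3^{(g)}) E f_G^{(2)}(c_w) \right) \ge 1 - e^{-\epsilon_4^{(g)} n}, \tag{43}$$

where again as usual $\epsilon_3^{(g)} > 0$ is an arbitrarily small constant and $\epsilon_4^{(g)}$ is a constant dependent on $\epsilon_3^{(g)}$ and $c_w$ but independent of $n$. Combination of (37), (42) and (43) gives (the only other thing one should observe here is that $E((\mathbf{G}^T \mathbf{z}^{(2)}) - \sum_{i=1}^{c_w} \mathbf{G}_i) > 0$)

$$P\left( \sqrt{\sum_{i=c_w+1}^{n+k} \mathbf{G}_i^2 - \frac{\left( (\mathbf{G}^T \mathbf{z}^{(2)}) - \sum_{i=1}^{c_w} \mathbf{G}_i \right)^2}{n - c_w}} \ge \sqrt{(1 - \epsilon_1^{(g)}) E \sum_{i=c_w+1}^{n+k} \mathbf{G}_i^2 - \frac{(1 + \epsilon_3^{(g)})^2 \left( E\left( (\mathbf{G}^T \mathbf{z}^{(2)}) - \sum_{i=1}^{c_w} \mathbf{G}_i \right) \right)^2}{n - c_w}} \right) \ge (1 - e^{-\epsilon_2^{(g)} n})(1 - e^{-\epsilon_4^{(g)} n}). \tag{44}$$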



Now, let 



dm,, 



ii+e'rY\ 



n+k 



M\2/ 



(45) 



i=Cii,+l 



n — c„ 



where £3 > is an arbitrarily small constant. Combining (l33T l. (IJTT i. (l40l ). (l44l) . and (1451 ) we have 

P{ min (||G-^z(2)||2)> ||h||2+e^^^\/d^) > (l-e-4'"'™)(l-e-4''^")(l-e-4''")(l-e-4''"). 

z(2)eZG,i/e/?'*'"\o ll^^lb 

(46) 
Further combination of (OTT i. (l32l ). (l33T l. and (l46l ) gives us that if m = niw 

P( min max (a^QA^u-a^z'^^^ + Wuhig-ei^^Vd^)) > 0) > (l-e-4'"''^'»»)(i_e-4'*«)(i_e-4''")(i_e-4''") 

z(2)eZG,i/ei?<*'"\o|la||2=i,Q 

(47) 

Since P{g < e^ y/dn) > 1 — e~'^6 '^^ (where eg is, as all other e's in this paper are, independent of n) 
from (l47b we finally have 

P{ min max (-a^Q^^iz+aV^)) > 0) > (l-e-4'"'^™») (l-e-4''«) (1-6-4"^") (l-e-4'^<^")(l-e-4''"). 

z(2)eZG,i/e/?'*'"\o ||a||2=i,Q 

(48) 
Connecting (l30l ) and (1481 ) we obtain 



P(-t(^) >0) > (1-e 



-e^ drriu 



Js) 



M 



ei'^dn^ 



)(l_e-^2 ")(i_e-^4 ")(i_e-^6 "")(!- e 



-e, TIN 



and ultimately 

lim P{t{A) < O) = lim (1-e" 



M) 



drriu 



-ei'^n 



-4^'n^ 



(g) 



dn\ 



)(l_e-^2 ")(l-e-^4 ")(l-e-^6 ''")(l-e 



(c) 



1 (49) 
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which is what we established as a goal in (|7]). We summarize the results in the following theorem. 

Theorem 3. (Exact weak threshold) Let $A$ be a $dm \times dn$ measurement matrix in ([7]) with the null-space
uniformly distributed in the Grassmannian. Let the unknown $x$ in ([7]) be $k$-block-sparse with the length of
its blocks $d$. Further, let the location and the directions of nonzero blocks of $x$ be arbitrarily chosen but
fixed. Let $k, m, n$ be large and let $\alpha = \frac{m}{n}$ and $\beta_w = \frac{k}{n}$ be constants independent of $m$ and $n$. Let $\gamma_{inc}(\cdot,\cdot)$
and $\gamma_{inc}^{-1}(\cdot,\cdot)$ be the incomplete gamma function and its inverse, respectively. Further, let all $\epsilon$'s below be
arbitrarily small constants.

1. Let $\hat{\theta}_w$ ($\beta_w \le \hat{\theta}_w \le 1$) be the solution of






il-e?)il-^.) ^^-^ ^ . """"^' ^^^ -j2,.-( ^^^r^^^-^ ^)-0. 






(50) 



If $\alpha$ and $\beta_w$ further satisfy



,2 r(^) / i-e^d d±2. 

r(f) i ^'"'^^^"^4-/3^' 2^' 2 



ad>{l- ^^)-^^ ( 1 - 7.nc(7-U^^, S), -^) 1 + /3.d 



(51) 



then with overwhelming probability the solution of (13) is the $k$-block-sparse $x$ from ([7]).
2. Let $\hat{\theta}_w$ ($\beta_w \le \hat{\theta}_w \le 1$) be the solution of

(52) 
If $\alpha$ and $\beta_w$ further satisfy






2 



Xl + e 



(9)^ 
3 ^ 



-2 



-) (53) 



then with overwhelming probability there will be a $k$-block-sparse $x$ (from a set of $x$'s with fixed
locations and directions of nonzero blocks) that satisfies ([7]) and is not the solution of (13).



Proof. The first part was established in [39]. The second part follows from the previous discussion,
combining the relations established there with (49). □
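The limit in (49) rests on an elementary estimate for a finite product of terms of the form $1 - e^{-cn}$. A short sketch is given below, assuming (as in the linear regime considered here) that each exponent is a positive constant multiple of $n$. The generic constants $c_j$ and the number of factors $J$ are notation introduced only for this remark.

```latex
% c_1, ..., c_J > 0 are generic constants (illustrative notation) and J is fixed.
% The first inequality is the standard bound prod (1 - a_j) >= 1 - sum a_j for a_j in [0, 1].
\prod_{j=1}^{J}\left(1 - e^{-c_j n}\right)
  \;\ge\; 1 - \sum_{j=1}^{J} e^{-c_j n}
  \;\ge\; 1 - J\, e^{-n \min_{j} c_j}
  \;\xrightarrow[n \to \infty]{}\; 1 .
```

Since each factor is also at most 1, the product is squeezed between this lower bound and 1, so it converges to 1 regardless of how small the individual constants are.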

The above theorem establishes the fundamental characterization of the $\ell_2/\ell_1$ performance. Numerical
values of the weak threshold obtained using (50) and (51) were presented in [39]. As was demonstrated
there, the lower bounds on the thresholds were in an excellent numerical agreement with the recovery
thresholds that can be obtained through numerical simulations. Theorem 3 establishes that the lower bounds
computed in [39] (essentially those one can compute from (50) and (51)) are actually the upper bounds as well
and as such are the exact values of the weak thresholds.
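Evaluating conditions such as (50) and (51) numerically requires the incomplete gamma function and its inverse. The following is a minimal sketch of how these two ingredients could be computed, assuming the regularized lower incomplete gamma function and the argument order (point, shape); the normalization and argument convention actually used in the theorem should be checked against [39]. The helper names are hypothetical, and the code does not implement (50) or (51) themselves. The last helper uses the standard fact that the squared Euclidean norm of a length-$d$ block of i.i.d. standard normals is chi-square distributed with $d$ degrees of freedom, which is presumably how the incomplete gamma function enters the analysis.

```python
import math


def gamma_inc(x: float, a: float, tol: float = 1e-14, max_terms: int = 10_000) -> float:
    """Regularized lower incomplete gamma function P(a, x), computed by its power series.

    P(a, x) = x^a e^{-x} / Gamma(a + 1) * sum_{j >= 0} x^j / ((a + 1) ... (a + j)).
    Intended for moderate x; this is a sketch, not a production routine.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0
    total = 1.0
    for j in range(1, max_terms):
        term *= x / (a + j)
        total += term
        if term < tol * total:
            break
    log_prefactor = a * math.log(x) - x - math.lgamma(a + 1.0)
    return min(1.0, total * math.exp(log_prefactor))


def gamma_inc_inv(p: float, a: float, tol: float = 1e-12) -> float:
    """Solve gamma_inc(x, a) = p for x by bisection, for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    lo, hi = 0.0, 1.0
    while gamma_inc(hi, a) < p:  # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if gamma_inc(mid, a) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def block_norm_sq_cdf(t: float, d: int) -> float:
    """P(||g||_2^2 <= t) for a length-d block g of i.i.d. standard normals (chi-square, d dof)."""
    return gamma_inc(t / 2.0, d / 2.0)
```

Bisection is used for the inverse only because it is simple and robust; a Newton-type iteration would converge faster but needs more care near the endpoints.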

For completeness we present in Figure 2 again the plot obtained based on the ultimate characterization
(50), (51).



[Plot: "Block-sparse weak thresholds as a function of block length d"; horizontal axis $\alpha$]

Figure 2: Weak threshold, $\ell_2/\ell_1$-optimization, ultimate performance



5 Discussion 



In this paper we considered under-determined linear systems of equations with sparse solutions. In our
recent papers we created mechanisms that can be used to analyze almost to perfection the performance
of a technique called $\ell_1$-optimization when used for solving such systems. When presenting those results
we mentioned that various generalizations are possible. In this paper we presented a set of such
generalizations. The results that we presented here relate to a specific type of sparse vectors, namely the
so-called block-sparse vectors.

We looked from a theoretical point of view at a classical polynomial-time $\ell_2/\ell_1$-optimization algorithm
that can be used for recovery of such vectors. Such an optimization algorithm is a natural generalization of
the above-mentioned $\ell_1$-optimization that is typically employed when the unknown vectors, besides being
sparse, are not known to possess any other type of structure. Under the assumption that the system matrix
$A$ has i.i.d. standard normal components, we derived upper bounds on the values of the recoverable weak
thresholds in the so-called linear regime, i.e. in the regime when the recoverable sparsity is proportional
to the length of the unknown vector. The obtained upper bounds match the corresponding lower bounds we
found through a framework designed in [39]. A combination of the mechanism from [39] and the one that
we presented in this paper is then enough to provide an explicit ultimate characterization of the success of
$\ell_2/\ell_1$-optimization when applied in solving under-determined systems of linear equations with block-sparse
solutions.
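For concreteness, the objective that $\ell_2/\ell_1$-optimization minimizes can be sketched as follows, assuming the usual definition as the sum of the Euclidean norms of consecutive length-$d$ blocks of the vector (the formal statement of the optimization problem appears earlier in the paper). The function names are hypothetical. The code only evaluates the objective and counts nonzero blocks; actually minimizing it subject to the linear constraints requires a second-order cone solver, which is not shown.

```python
import math
from typing import List, Sequence


def split_blocks(x: Sequence[float], d: int) -> List[Sequence[float]]:
    """Split x into consecutive blocks of length d."""
    if d <= 0 or len(x) % d != 0:
        raise ValueError("len(x) must be a positive multiple of the block length d")
    return [x[i:i + d] for i in range(0, len(x), d)]


def l2_l1_norm(x: Sequence[float], d: int) -> float:
    """Mixed l2/l1 norm: the sum over blocks of each block's Euclidean norm."""
    return sum(math.sqrt(sum(v * v for v in block)) for block in split_blocks(x, d))


def block_sparsity(x: Sequence[float], d: int, tol: float = 0.0) -> int:
    """Number of blocks whose Euclidean norm exceeds tol."""
    return sum(
        1
        for block in split_blocks(x, d)
        if math.sqrt(sum(v * v for v in block)) > tol
    )


if __name__ == "__main__":
    x = [3.0, 4.0, 0.0, 0.0, 0.0, 5.0]  # three blocks of length d = 2
    print(l2_l1_norm(x, 2), block_sparsity(x, 2))
```

With $d = 1$ every block is a single entry and the objective reduces to the ordinary $\ell_1$ norm, which is the sense in which $\ell_2/\ell_1$-optimization generalizes $\ell_1$-optimization.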
As mentioned in a companion paper [38], further developments are then pretty much unlimited. Various
specific problems that have been of interest in a broad scientific literature developed over the last few years
can then easily be handled. Examples include (but of course are not limited to) problems like quantifying
the performance of $\ell_2/\ell_1$ (or even $\ell_1$) type of optimization problems in solving systems which on top of
having block-sparse solutions also possess other types of structured solution vectors (binary, box-constrained,
partially known locations of nonzero blocks, just to name a few), and systems with non-exact (noisy) solution
vectors and/or equations. In a few forthcoming companion papers we will present some of these applications.
However, as will be clear when these results appear, each of them will require some work to put the
mechanism forth, but in essence they all will be fairly simple extensions of what we presented in [38, 39] and
here. The heart of it all will really be the lower-bounding mechanism designed in [39, 42] and the
complementary upper-bounding mechanism designed in [38] and in this paper, and how the two ultimately meet in
a nice way.